Logistic regression is used when we want to classify each observation into one of two categories.
The model describes the probability \(p_i\) that observation \(i\) belongs to one of the two categories through the logit transformation
\[\operatorname{logit}(p_i)= \log\left(\frac{p_i}{1-p_i}\right)=\beta_0+\beta_1x_{1,i}+\dots+\beta_kx_{k,i}\]
The classifier compares the estimated probability \(p_i\) with 0.5: an observation with \(p_i > 0.5\) (equivalently, \(\operatorname{logit}(p_i) > 0\)) is assigned to one category, and an observation with \(p_i < 0.5\) is assigned to the other.
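The link between the probability scale and the logit scale can be checked numerically. The following is a minimal sketch using base R's `qlogis()` (logit) and `plogis()` (inverse logit); the probabilities in `p` are made-up illustrative values, not taken from the data set.

```r
# Illustrative probabilities (hypothetical values, not taken from the data)
p <- c(0.10, 0.40, 0.50, 0.60, 0.90)

# Logit (log-odds) written out explicitly, and via the built-in qlogis()
logit_manual  <- log(p / (1 - p))
logit_builtin <- qlogis(p)

# plogis() is the inverse logit: it maps log-odds back to probabilities
p_back <- plogis(logit_builtin)

# Thresholding the probability at 0.5 gives the same labels as
# thresholding the logit at 0
class_by_prob  <- as.integer(p > 0.5)
class_by_logit <- as.integer(logit_builtin > 0)

print(data.frame(p, logit = logit_builtin, class_by_prob, class_by_logit))
```

Both label columns agree, which is why the 0.5 cutoff belongs on the probability scale and the 0 cutoff on the logit scale.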
library(kableExtra)
base <- read.csv("../Bases de datos/boston-housing-logistic.csv")
kable(head(base), "markdown")
NOX | DIS | RAD | TAX | PTRATIO | B | CLASS |
---|---|---|---|---|---|---|
0.538 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 1 |
0.469 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 1 |
0.458 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 1 |
0.458 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 1 |
0.458 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 1 |
0.524 | 5.9505 | 5 | 311 | 15.2 | 396.90 | 1 |
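# Treat the response as a categorical variable
base$CLASS <- as.factor(base$CLASS)
# Fit a logistic regression of CLASS on all remaining columns
modelo <- glm(CLASS ~ ., data = base, family = binomial)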
summary(modelo)
##
## Call:
## glm(formula = CLASS ~ ., family = binomial, data = base)
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -2.19892  -0.44095   0.09147   0.38909   3.03467
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  27.282635   3.523725   7.743 9.74e-15 ***
## NOX         -24.051882   4.271615  -5.631 1.80e-08 ***
## DIS          -0.485594   0.132250  -3.672 0.000241 ***
## RAD           0.223504   0.062746   3.562 0.000368 ***
## TAX          -0.007467   0.003261  -2.290 0.022030 *
## PTRATIO      -0.747253   0.114739  -6.513 7.39e-11 ***
## B             0.007702   0.002775   2.776 0.005509 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 502.43 on 362 degrees of freedom
## Residual deviance: 223.48 on 356 degrees of freedom
## AIC: 237.48
##
## Number of Fisher Scoring iterations: 6
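To see how the fitted coefficients turn into a classification, the sketch below reproduces the prediction for the first row of `head(base)` by hand. It uses the rounded estimates printed in the summary, so the result is approximate, and it assumes that `CLASS` has levels 0 and 1, in which case `glm` models the probability of `CLASS = 1`. The names `beta`, `x`, `eta`, `p_hat`, `pred_class` and `odds_ratios` are introduced here for illustration only.

```r
# Coefficients copied (rounded) from the summary(modelo) output
beta <- c("(Intercept)" = 27.282635, NOX = -24.051882, DIS = -0.485594,
          RAD = 0.223504, TAX = -0.007467, PTRATIO = -0.747253, B = 0.007702)

# First row of head(base), with a leading 1 for the intercept
x <- c("(Intercept)" = 1, NOX = 0.538, DIS = 4.0900, RAD = 1,
       TAX = 296, PTRATIO = 15.3, B = 396.90)

# Linear predictor (log-odds) and the corresponding probability
eta   <- sum(beta * x[names(beta)])
p_hat <- plogis(eta)

# Classify at the 0.5 probability threshold
# (assumption: CLASS has levels 0 and 1, so p_hat estimates P(CLASS = 1))
pred_class <- as.integer(p_hat > 0.5)

# Odds ratios: multiplicative change in the odds per one-unit increase
# in each predictor, holding the others fixed
odds_ratios <- exp(beta[-1])

print(c(eta = eta, p_hat = p_hat, pred_class = pred_class))
print(round(odds_ratios, 4))
```

The linear predictor for this row is positive, so the estimated probability exceeds 0.5 and the row is assigned to class 1, which agrees with the `CLASS` value shown in the table. With the fitted object available, `predict(modelo, type = "response")` would return the same kind of probability for every row.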